Combining EBMT, SMT, TM and IR Technologies for Quality and Scale
نویسندگان
چکیده
In this paper we present a hybrid statistical machine translation (SMT)-example-based MT (EBMT) system that shows significant improvement over both SMT and EBMT baseline systems. First we present a runtime EBMT system using a subsentential translation memory (TM). The EBMT system is further combined with an SMT system for effective hybridization of the pair of systems. The hybrid system shows significant improvement in translation quality (0.82 and 2.75 absolute BLEU points) for two different language pairs (English–Turkish (En–Tr) and English– French (En–Fr)) over the baseline SMT system. However, the EBMT approach suffers from significant time complexity issues for a runtime approach. We explore two methods to make the system scalable at runtime. First, we use an heuristic-based approach. Secondly, we use an IR-based indexing technique to speed up the time-consuming matching procedure of the EBMT system. The index-based matching procedure substantially improves run-time speed without affecting translation quality.
منابع مشابه
Using Example-Based MT to Support Statistical MT when Translating Homogeneous Data in a Resource-Poor Setting
In this paper, we address the issue of applying example-based machine translation (EBMT) methods to overcome some of the difficulties encountered with statistical machine translation (SMT) techniques. We adopt two different EBMT approaches and present an approach to augment output quality by strategically combining both EBMT approaches with the SMT system to handle issues arising from the use o...
متن کاملHybrid Example-Based SMT: the Best of Both Worlds?
(Way and Gough, 2005) provide an indepth comparison of their Example-Based Machine Translation (EBMT) system with a Statistical Machine Translation (SMT) system constructed from freely available tools. According to a wide variety of automatic evaluation metrics, they demonstrated that their EBMT system outperformed the SMT system by a factor of two to one. Nevertheless, they did not test their ...
متن کاملHybridity in MT: Experiments on the Europarl Corpus
(Way & Gough, 2005) demonstrate that their Marker-based EBMT system is capable of outperforming a word-based SMT system trained on reasonably large data sets. (Groves & Way, 2005) take this a stage further and demonstrate that while the EBMT system also outperforms a phrase-based SMT (PBSMT) system, a hybrid ‘example-based SMT’ system incorporating marker chunks and SMT sub-sentential alignment...
متن کاملHybridity in MT
(Way & Gough, 2005) demonstrate that their Marker-based EBMT system is capable of outperforming a word-based SMT system trained on reasonably large data sets. (Groves & Way, 2005) take this a stage further and demonstrate that while the EBMT system also outperforms a phrase-based SMT (PBSMT) system, a hybrid ‘example-based SMT’ system incorporating marker chunks and SMT sub-sentential alignment...
متن کاملMaximising TM Performance through Sub-Tree Alignment and SMT
With the steadily increasing demand for highquality translation, the localisation industry is constantly searching for technologies that would increase translator throughput, in particular focusing on the use of high-quality Statistical Machine Translation (SMT) supplementing the established Translation Memory (TM) technology. In this paper, we present a novel modular approach that utilises sta...
متن کامل